Method and system for reconciling meta-data in a data warehouse

ABSTRACT

A data warehouse system comprising a warehouse database and a data warehouse manager communicatively coupled to the database. The warehouse manager is to store data to and retrieve data from the warehouse database and perform metadata reconciliation to update metadata of objects in the data warehouse to reflect a change in metadata of a first object in the warehouse by identifying a first set of objects impacted by the change to the metadata of the first object, identifying a second set of objects on which the first object depends, identifying at least one task that has to be completed to propagate the impact of the change in the metadata of the first object to objects in the first and second sets of objects, performing the task, and dynamically computing additional tasks based on whether or not definitions of objects in the first or second sets of objects changes based upon completion of the at least one task.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation of application Ser. No. 12/908,565,filed Oct. 20, 2010, which is a continuation of application Ser. No.10/849,425, filed May 18, 2004, which application claims the benefit ofU.S. Provisional Application No. 60/500,448, filed Sep. 5, 2003.Application Ser. No. 10/849,425 is also a continuation-in-part of U.S.patent application Ser. No. 10/293,627, filed Nov. 12, 2002. Thedisclosures of 60/500,448 and Ser. No. 10/293,627 are hereinincorporated herein by reference in their entirety for all purposes.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSOREDRESEARCH OR DEVELOPMENT

NOT APPLICABLE

REFERENCE TO A “SEQUENCE LISTING,” A TABLE, OR A COMPUTER PROGRAMLISTING APPENDIX SUBMITTED ON A COMPACT DISK

NOT APPLICABLE

BACKGROUND OF THE INVENTION

The present invention relates to information management systems and,more particularly, to a method and system for performing metadatareconciliation in a data warehousing environment.

A data warehouse is a centralized collection of data. Data warehousesare ideally suited for supporting management decision-making in businessorganizations since data from disparate and/or distributed sources maybe stored and analyzed at a central location. For example, a financialservices organization may store and aggregate in a data warehouse largeamounts of financial data obtained from its regional office databasesaround the world. Various analytical and reporting tools may then beused to process the aggregated data to present a coherent picture ofbusiness conditions at a particular point in time, and thereby supportmanagement decision making of the financial services organization.

Data warehouses are typically implemented on a Database ManagementSystem (DBMS) that includes a large database for storing the data, adatabase server for processing queries against the database and one ormore database applications for accessing the DBMS. The types ofapplications that are provided for a data warehouse vary widely,depending upon the requirements of a particular implementation. Forexample, a data warehouse may include an application for configuring thedatabase schema used for the data warehouse database. As anotherexample, a data warehouse may include an application for extracting datafrom source databases and then storing the extracted data in the datawarehouse. A data warehouse may also include an application forgenerating reports based upon data contained in the data warehouse.

Building a data warehouse is not an easy task. And, often times, after adata warehouse is well designed and built, it is desirable to introducechanges to the warehouse for any of a variety of reasons. A few examplesof changes that impact a data warehouse design include a new column in asource table utilized in the warehouse, a new requirement on theinformation stored in the warehouse or a new query on the data warehouseto name a few. When such changes are present, the data warehouse designbecomes out of sync and requires fixes. The process of fixing ormodifying a data warehouse design so it is in sync with the new changesis called reconciliation and focuses on reconciling metadata.

As used herein, the term “metadata” refers generally to data thatdefines other data. In the context of data warehousing, the term“metadata” refers to data that defines data that is stored in a sourcedatabase or in a data warehouse. For example, in the context of datawarehousing, metadata may include the database schema used in a sourcedatabase or in a data warehouse. As described in more detailhereinafter, metadata may define not only the final data that is storedin the data warehouse, but also intermediate data and structures, suchas staging tables that are used to determine the final data.

The term “metadata reconciliation” generally refers to updating themetadata of an object to reflect changes made to the metadata of anotherobject. The need for metadata reconciliation arises when there is arelationship between the two objects and the relationship is affected bythe change to the metadata of one of the objects. For example, supposethat the definition (metadata) of object B depends upon the definition(metadata) of object A. A change to the metadata of object A disruptsthe relationship between objects A and B. Metadata reconciliation may beused to update the metadata for object B to reflect the change made tothe metadata for object A.

In the context of data warehousing, metadata reconciliation refers toupdating the metadata for a data warehouse to reflect changes made tothe definition of an object that one or more other objects in the datawarehouse rely upon. The object may be external to the data warehouse,e.g., a source database object, or internal to the data warehouse. Forexample, changing the design of a data warehouse may change thedefinition of an object within the data warehouse, such as anintermediate table. When the definition of an object changes, themetadata for a data warehouse must be updated to reflect the changesmade to the definition of the object before additional data can beimported into the data warehouse. Otherwise, attempting to store newdata that conforms to the updated definition will cause errors in thedata warehouse.

For example, the database schema used in a data warehouse is designedbased upon the database schemas that define the corresponding sourcedatabases. Each source database may have its own (different) databaseschema and all of the source data must be aggregated onto a singledatabase in the data warehouse. To achieve this result, a data warehousetypically includes a transformation mechanism, usually in the form of adata warehouse software application or module that is configured totransform source data that conforms to the database schema used in thesource databases, into data that conforms to the database schema used inthe data warehouse. The extent of transformation that must be performedgenerally varies depending upon size of the data warehousing deploymentand differences between the source database schemas and the databaseschema used in the data warehouse. For example, in data warehousingdeployments with large numbers of disparate source databases, thetransformation process can be very complex.

Data warehouses are conventionally maintained manually by warehousedesigners who, in response to a change made to a source database schemaor the design of the data warehouse, must first identify the datawarehouse objects, such as tables, transformation mechanisms andapplications affected by the change, including any temporary objects andtables, and then update these temporary objects, tables, applicationsand the data warehouse schema to reflect the change made to the sourcedatabase schema or the design of the data warehouse. Updating thedatabase schema of the data warehouse can be very complex and require asignificant amount of human resources, even just to identify the changesin metadata that must be made. For example, a warehouse designer mayhave to manually inspect a large number of database schema objects,including objects involved in the transformations, to determine which ofthese objects rely upon the objects that have been modified. Thisproblem may be compounded when, because of complex data dependencies inthe data warehouse, changes made to the database schema of the datawarehouse affect other database schema objects that then must also beupdated. Thus, changes to a database schema of a source database, or tothe design of a data warehouse, that initially appear to directly affectonly a few database schema objects in a data warehouse may in practiceindirectly affect large numbers of database schema objects in the datawarehouse. Hence, even seemingly small changes to the schema of sourcedatabases can require a significant amount of human resources toreconcile data warehouse metadata.

While a variety of tools have been created to facilitate the datawarehouse reconciliation task, there is still a need for improvements inthe reconciliation process.

BRIEF SUMMARY OF THE INVENTION

Embodiments of the invention pertain to a system and method forreconciling a data warehouse in which a change in the definition ormetadata of a warehouse object has occurred.

According to one embodiment of the invention a data warehouse systemcomprising a warehouse database and a data warehouse managercommunicatively coupled to the database is provided. The warehousemanager is to store data to and retrieve data from the warehousedatabase and perform metadata reconciliation to update metadata ofobjects in the data warehouse to reflect a change in metadata of a firstobject in the warehouse by identifying a first set of objects impactedby the change to the metadata of the first object, identifying a secondset of objects on which the first object depends, identifying at leastone task that has to be completed to propagate the impact of the changein the metadata of the first object to objects in the first and secondsets of objects, performing the task, and dynamically computingadditional tasks based on whether or not definitions of objects in thefirst or second sets of objects changes based upon completion of the atleast one task.

According to another embodiment of the invention a method for managing adata warehouse is disclosed. The method includes the steps of allowing auser to select an object in the data warehouse and identify a change inthe definition of the selected object; identifying a first set ofobjects impacted by the change to the definition of the selected object;identifying a second set of objects on which the selected objectdepends; identifying at least one task that has to be completed topropagate the impact of the change in the definition of the selectedobject to objects in the first and second sets of objects; performingthe task; and dynamically computing additional tasks based on whether ornot definitions of objects in the first or second sets of objectschanges based upon completion of the at least one task.

These and other embodiments of the invention along with many of itsadvantages and features are described in more detail in conjunction withthe text below and attached figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram that depicts a data warehouse arrangementaccording to one embodiment of the invention;

FIG. 2A is a block diagram depicting dependency relationships amongobjects in an impact analysis example;

FIG. 2B is a block diagram depicting dependency relationships amongobjects in a lineage example;

FIG. 3 is a block diagram that depicts a structural dependency modelaccording to one embodiment of the invention;

FIGS. 4( a)-(c) are flow charts that depict the metadata reconciliationprocess according to one embodiment of the invention;

FIG. 5 is a graph depicting the dependencies in a portion of anexemplary data warehouse according to an example implementation of oneembodiment of the invention;

FIGS. 6( a)-(m) are screen shots depicting a GUI and sequence of eventsaccording to one exemplary implementation of the invention; and

FIG. 7 is a simplified block diagram that illustrates a computer systemupon which an embodiment of the invention may be implemented.

DETAILED DESCRIPTION OF THE INVENTION

In the following description, for the purposes of explanation, numerousspecific details are set forth in order to provide a thoroughunderstanding of the present invention. It will be apparent, however, toone skilled in the art that the present invention may be practicedwithout these specific details. In other instances, well-knownstructures and devices are shown in block diagram form in order to avoidunnecessarily obscuring the present invention.

FIG. 1 is a block diagram that depicts a data warehousing arrangement orsystem 100 according to an embodiment of the invention. System 100includes source databases 102, 104, 106, a data warehouse 108 and awarehouse manager 110. Source databases 102, 104, 106 arecommunicatively coupled to warehouse manager 110 via links 112, 114,116, respectively. Warehouse manager 110 is communicatively coupled todata warehouse 108 via a link 118.

Source databases 102, 104, 106 may be part of one or more databasemanagement systems, e.g., with database servers, that are not depictedin the figures nor described herein for purposes of explanation. Sourcedatabases 102, 104, 106 may contain any type of data in any form and theinvention is not limited to any particular type or form of data. Also,embodiments of the invention are applicable to any number of datasources that may be of the same or different types.

Data warehouse 108 includes a database 120 and a database server 122.Database 120 stores data and database server 122 is configured toprocess queries against database 120. For purposes of explanation,embodiments of the invention are depicted in the figures and describedin the context of a single data warehouse 108 having a single database120 and a single database server 122. The invention, however, is notlimited to this example arrangement and embodiments of the invention areapplicable to data warehousing arrangements with any number of datawarehouses, each having any number of databases and database servers.Furthermore, data warehouse 108 may be configured with other componentsthat are not depicted in the figures or described herein, depending uponthe requirements of a particular application.

For purposes of explanation, links 112, 114, 116 are depicted in FIG. 1as direct connections. Links 112, 114, 116, may however, be implementedby any medium or mechanism that provides for the exchange of databetween source databases 102, 104, 106 and data warehouse 108,respectively. Examples of links 112, 114, 116 include, withoutlimitation, a network such as a Local Area Network (LAN), Wide AreaNetwork (WAN), Ethernet or the Internet, or one or more terrestrial,satellite or wireless links. Furthermore, depending upon a particularimplementation, links 112, 114, 116 may not be physical connections, butrather routine calls or any other mechanisms used by computer softwareroutines to exchange information.

Source databases 102, 104, 106, data warehouse 108 and warehouse manager110 may be located on separate nodes, for example in a distributedcomputing environment, or co-located on a single node, depending uponthe requirements of a particular application. The invention is notlimited to any particular physical arrangement of source databases 102,104, 106, data warehouse 108 and warehouse manager 110.

Warehouse manager 110 generally manages data warehouse 108 and mayperform a variety of functions, depending upon the requirements of aparticular application. Example functions include, without limitation,configuring data warehouse 108, receiving data from source databases102, 104, 106, storing data to and retrieving data from data warehouse108, reconciling data warehouse 108 and generating reports based upondata stored in data warehouse 108.

In one embodiment, warehouse manager 110 includes a data retrieval andstorage application 140, a transformation application 142, a dependencymanagement application 144 and a reconciliation task managementapplication 146 (hereinafter referred to as task management application146). Warehouse manager 110 and its constituent components, i.e., dataretrieval and storage application 140, transformation application,dependency management application 144 and task management application146 may be implemented in computer hardware, computer software, or anycombination of computer hardware or software and the invention is notlimited to any particular implementation.

Data retrieval and storage application 140 is configured to retrievedata, e.g., objects 124, 126, 128, from source databases 102, 104, 106.Data retrieval and storage application 140 is also configured to storeand retrieve data from data warehouse 108. For example, data retrievaland storage application 140 is configured to store and retrieve object130 to and from data warehouse 108.

Transformation application 142 is configured to perform transformationsof data including aggregating data, deleting data and adding data amongother transformations. For example, transformation application 142 isconfigured to transform one or more of objects 124, 126, 128 thatconform to the database schemas for source databases 102, 104, 106,respectively, into object 130 that conforms to the database schema fordata warehouse 108. For example, in the context where objects 124, 126,128 and object 130 are database tables, respectively, transformationapplication 142 may be configured to combine the three database tables124, 126, 128 to create a single database table 130. This may requirecombining rows of data, deleting rows of data, or adding rows of data tocreate the target database table. As an example, suppose that objects124, 126, 128 each include address data, and more specifically, object124 includes four lines of address data and objects 126, 128 eachinclude three lines of address data. Suppose further that the databaseschema for data warehouse 108 specifies that object 130 is to have threelines of address data. In this situation, transformation application 142processes the four lines of address data contained in object 124 tocreate three lines of address data that can be aggregated with theaddress data from objects 126, 128 to conform to the database schema fordata warehouse 108. For example, transformation application 142 maycombine two of the four address lines into a single address line tocreate the necessary three address lines. Once the transformation iscomplete, data retrieval and storage application 140 may be used tostore transformed data from the source databases to data warehouse 108.

Dependency management application 144 is configured to managedependencies among objects. Objects are conventionally defined bymetadata that specifies the attributes of an object. In this context,dependency management application 144 is configured to perform metadatadependency management. This involves performing, in response to a changeto the definition of an object, dependency analysis to identify otherobjects affected by the change. For example, suppose that the databaseschema that defines object 124 is changed. In this situation, dependencymanagement application 144 analyzes the metadata for object 130 todetermine whether object 130 is dependent upon the database schema forobject 124 and is affected by the change to the database schema forobject 124.

Dependency management application 144 is further configured to updatemetadata for the other objects to reflect changes made to the definitionof the object and to update transformation application 142 to reflectchanges made to the definition of an object so that transformationapplication 142 can generate the other objects from new data thatconforms to the changed definition of the object. In the example justprovided, dependency management application 144 is configured to updatethe metadata that defines object 130 to reflect the change made to thedatabase schema that defines object 124 and to update transformationapplication 142 so that transformation application 142 can generate newversions of object 130 based upon new versions of object 124 thatconform to the updated database schema that defines object 124.

Dependency management application 144 performs at least two specifictypes of dependency analysis: “impact analysis” and “lineage analysis”.An “impact analysis” involves identifying which objects will be impactedby a change to a particular object while “lineage analysis” involvesidentifying the set of objects on which an object depends. Thus, impactanalysis is sometimes referred to as a forward looking inquiry whilelineage analysis is sometimes referred to as a backwards lookinginquiry.

FIG. 2A is a block diagram depicting objects in an impact analysisexample, and FIG. 2B is a block diagram depicting objects in a lineageexample. Referring to FIG. 2A which depicts an impact analysis of objectA, object A is “affects” or “impacts” object B, which in turn impactsobjects C and D. Thus, a modification in the data or structure of objectA impacts object B, which in turn impacts objects C and D. Referring toFIG. 2B, which depicts a lineage analysis of object C, object C is inthe “lineage” of both object E and object A, thus, object C can “affect”both objects E and A. For example, if the objects depicted in FIG. 2Bare Object Oriented classes, and object B is a subclass of object C, andobjects E and A are subclasses of object B, then a change to object Caffects not only object B but also affects objects E and A.

Both FIGS. 2A and 2B have many objects in common, yet the set of objectsin the lineage and impact analysis examples is not identical. Graphingthe dependencies between objects is not simply a matter of connectingall objects with all the other objects regardless of the type ofrelationship between the objects or the type of analysis being performedon the objects. For example, objects A, B, and C are found in both thelineage and the impact analysis examples, however, object D is only inthe impact analysis example. Further, object E is only in the lineageexample.

The nature of objects between which dependencies exist may vary fromcontext to context. For example, in one context, the objects betweenwhich dependencies exist may include a fields or a column in a databasetable. In another context, the object may be a function that operates onother objects or performs some transformation on objects. For example,an object could be a join on two database tables or the results of thejoin. Similarly, what it means for one object to “depend on” anotherobject may vary from context to context.

In one embodiment, dependency management application 144 performs suchdependency analysis based upon a dependency model for a particular typeof dependency to be analyzed. Each dependency model includes a set ofdependency rules that govern dependencies between objects. Thedependency rules for a particular dependency model may be determinedusing a variety of approaches and the invention is not limited to anyparticular type of dependency rules or any particular approach fordetermining dependency rules. For example, dependency rules may bedetermined based upon a set of heuristics specified for a particularcontext.

The objects in a dependency model define the domain of the dependencymodel. Dependency analysis may be performed on any type of object,including objects stored in warehouse manager 110, as well as temporaryand intermediate objects that may be used by warehouse manager 110 andtransformation application 142. For example, transformation application142 may create and use intermediate objects, for example intermediatedatabase tables, to generate objects stored in data warehouse 108.Changes to the definitions of objects upon which the intermediateobjects depend may require a change to how the intermediate objects aredefined. Hence, dependency analysis may be used to identify intermediateobjects that are affected by changes made to the definitions of objects.

FIG. 3 is a block diagram that depicts an example structural dependencymodel 300 that can be implemented by dependency manager 144 according toan embodiment of the invention. The domain of structural dependencymodel 300 includes a transformation object 302, a process flow object304, a map object 306, a reusable map object 308, a cube object 310, adimension object 312, a table object 314 and a business intelligenceobject 316. The foregoing objects are example objects and other objectsmay be used depending upon the requirements of a particular application.The invention is not limited to any particular set or type of objects.

A set of connecting lines 318 depicts dependencies between theaforementioned objects. For each connecting line 318, the arrowheadidentifies an object that depends on another object attached to theother end of the connecting line. For example, both map object 306 andbusiness intelligence object 316 depend on table object 314.Accordingly, a change to table object 314 necessitates a change to mapobject 300 reusable map object 308 and business intelligence object 316.Note that process flow object 304 and reusable map object 308 each havea connecting line 320, 322, respectively, to themselves. Connectinglines 320, 322 depict that process flow object 304 may depend uponanother process flow object and that reusable map object 308 may dependupon another reusable map object.

Dependency rules may be implemented using different approaches,depending upon the requirements of a particular application. Forexample, a set of dependency rules may be represented by a set ofmathematical equations that define the dependency relationships. In thissituation, the set of mathematical equations for table object 314 mightbe:

-   -   T→M    -   T→ML, and    -   T→BI

where T=table object 314; M=map object 306; ML=reusable map object 308;BI=business intelligence object 316 and the symbol “→” in the equationsspecifies a dependency. Thus, in the present example, map object 306,reusable map object 308 and business intelligence object 316 eachdepends upon table object 314.

Based upon the dependency model, dependency data that reflectsdependencies defined in the model can be generated and stored in, forexample, dependency documents (not shown in FIG. 1) that are referencedby dependency management application 146. According to one embodiment ofthe invention, dependency documents conform to a common dependencyformat and data that reflects sets of dependencies is stored independency documents according to the common format. In one particularembodiment, the common dependency format is a hierarchical format.

The following is an example of how dependency information related to an“Object A” may be represented in a hierarchical format:

10 Object A 15   dependency { 20 object B 25   dependency { 30 object E35 } 40 object C 45 object D 50 }

Each line in the above example is depicted with a reference number tothe left for the purposes of facilitating the following explanation. Inthis example, line 10 includes the object name, Object A, for aparticular element. According to one embodiment, begin and end tagsdelineate the dependency information that is associated with the object.In this example, a “{” is a begin tag and a “}” is an end tag.Information between lines 15 and 50 is the dependency informationassociated with object A and the information between lines 25 and 35 isthe dependency information associated with object B.

According to one embodiment, tags are used to describe which objectsdepend (referred to hereinafter as “dependent objects”) on otherobjects. For example, the “dependency” tag at line 15 may indicate thatthe elements between lines 15 and 50 represent objects that depend onobject A. As depicted in the above example, objects B, C, and D dependon object A. Object E depends on Object B.

The “dependency” tag is only one example of a tag that may be used fordescribing the relationship between elements. According to oneembodiment, a “parent” tag may be used to indicate that a particularelement represents an object that is depended on by another object(referred to hereinafter as a “parent object”).

According to one embodiment, dependency documents are Extensible MarkupLanguage (XML) documents that conform to a particular XML schema that isspecifically designed for representing dependencies between objects.Appendix A is an example of such an XML schema.

XML allows for the creation of customized tags enabling the definitionof data. An XML schema can be used to define the customized tags todelineate elements in XML documents. For example, the XML schemadepicted in Appendix A defines tags for delineating, among other things,elements that correspond to objects in general, elements that correspondspecifically to parent objects and elements that correspond specificallyto dependent objects. Further, the XML schema depicted in appendix Adefines an element that corresponds to the name of objects, as will bedescribed in more detail.

At times, the dependency relationship between objects is recursive innature. For example, object A may depend on an object B, which maydepend on an object C, which depends back on object A. According to oneembodiment, the common dependency format provides a mechanism to avoidrepeating the data associated with object A the subsequent times thatobject A appears in the common dependency format.

For example, the XML schema depicted in appendix A provides an attributename “complete”, which is used for this mechanism. The attribute name“complete” may be set to a Boolean value of “yes” or “no” indicatingwhether this is the first time an element describing an object'sdependencies appears in an XML document or a subsequent occurrence of anelement describing the same object's dependencies. The first time anelement describing an object's dependencies appears in an XML document,the object's dependency information is fully described and “complete” isset to “yes”. The subsequent times an element describing the same objectappears in the XML document, “complete” is set to “no” indicating thatthe subsequent appearance should obtain dependency information about theobject from the first appearance.

Appendix B illustrates a dependency document that may be used to performimpact analysis, according to one embodiment of the invention. Thedependency document, as illustrated in appendix B, is an XML documentthat conforms to the XML schema illustrated in appendix A. Referencenumbers, which are used to indicate the XML statements discussed herein,are in the right margin of appendix B. At reference number 30 a, the“LIAType=‘impact’” tag indicates that this XML document is for impactanalysis.

As depicted in appendix B, <OBJECT> and </OBJECT> are respectively beginand end tags used for delineating elements which comprise dependencyinformation associated with objects. <DEPENDENCY> and </DEPENDENCY> arerespectively begin and end tags used for delineating elements whichcomprise dependency information associated with dependent objects.<PARENT> and </PARENT>> are respectively begin and end tags used fordelineating elements which comprise dependency information associatedwith parents objects. <NAME> and </NAME> are respectively begin and endtags used for delineating the name of objects in general.

At reference numbers 34 a and 35 a, appendix B depicts the names of adependent objects “Products” and “Expr”. At reference number 33 a,appendix B depicts the name of the object, “Union_(—)1”, that “Product”and “Expr” depend on. At reference number 33 b, appendix B depicts thename, “Map_products” of the parent object of “Union_(—)1”.

As already stated, at times the relationship between objects isrecursive in nature. The attribute name “complete” is one example of amechanism to avoid repeating dependency information for objects. Forexample, the first occurrence of dependency information for the“Products” object in the XML document is depicted at reference number 34a in appendix B. The second occurrence of dependency information for the“Products” object is depicted at reference number 38 a in appendix B. A“complete=yes” tag, at reference 37 c, is associated with the firstoccurrence and a “complete=no” tag, at reference 38 c, is associatedwith the second occurrence. Thus, the second occurrence of “Products”may obtain dependency information from the first occurrence.

In a similar manner, appendix C illustrates a dependency document foranalyzing the lineage of objects, according to one embodiment. Thedependency document, as illustrated in appendix C, is an XML documentthat also conforms to the XML schema illustrated in appendix A.

According to one embodiment, data for different sets of dependencies arestored in different dependency documents. For example, the data in thedependency document illustrated in appendix B pertains to the impactanalysis of a plurality of objects, whereas, the data in the dependencydocument illustrated in appendix C pertains to the lineage of aplurality of objects.

According to one embodiment, all sets of dependencies are stored independency documents that conform to a common dependency format. Forexample, the data in the dependency document illustrated in appendix Band the data in the dependency document illustrated in appendix C bothconform to the XML schema illustrated in appendix A.

Further details of techniques that can be used by dependency managementapplication 144 to manage dependencies between objects in system 100 aredescribed in U.S. patent application Ser. No. 10/325,784, filed on Dec.18, 2002 and entitled “Analyzing the Dependencies Between Objects in aSystem”, which is hereby incorporated by reference in its entirety.

Referring back to FIG. 1, task management application 146 is configuredto determine the tasks required to resolve the impact of a change in thedefinition or metadata of a warehouse object in system 100 and reconcilethe change within data warehouse 108. Task management application 146can be configured to detect such changes and automatically initiate thereconciliation process or it can be configured to initiate metadatareconciliation in response to a request from a user. In performing thereconciliation process, task management application 146 is configured toinvoke dependency management application 144 to perform impact andlineage analysis on an identified change and invoke transformationmanager 142 to perform necessary transformations.

In one embodiment task management application 146 is further configuredto enable users to perform project planning by previewing andquantifying any potential impact to data warehouse 108 by an impendingchange to the design of the warehouse or the definition of an object inwarehouse 108 or one of the source databases 102, 104, 106. Also, someembodiments of task management application 146 provide a graphical userinterface (GUI) that displays dependency graphs of impacted datawarehouse objects based upon the dependency model and its constituentdependency rules and displays a list of all tasks potentially involvedin keeping the data warehouse in sync when changes are introduced. Thisallows users to gauge the size of a reconciliation project along withthe amount of effort and time that may be needed to complete the projectand may be used to track. The GUI can also be used to track the statusof completion of the various tasks that are suggested by task managementapplication 146 as necessary or appropriate for warehousereconciliation. In such a GUI, objects in system 100 may be representedgraphically by a user interface (UI) object. Various menus, shapes,colors and even sounds may be used to differentiate objects and theirrelationships to other objects and the invention is not limited to anyparticular implementation.

According to one embodiment of the invention, in response to a changemade to any of the database schemas used by source databases 102, 104,106, task management application 146 determines the tasks required toreconcile the entire data warehouse to account for the change. Thisinvolves performing impact and lineage analysis on the change, and basedon the results of that analysis, analyzing the impact and lineage graphsto determine a list of tasks in a particular order required to reconcilethe warehouse. During the course of completing the tasks, taskmanagement application 146 may allow a user to provide input to modifythe sequence of suggested tasks to perform the reconciliation process,add additional tasks to the process or skip selected tasks. Based onsuch user input, task management application 146 can recalculate thetasks required to reconcile the data warehouse and provide an updatedtask list for user review.

FIG. 4( a) is a flow chart that depicts the metadata reconciliationprocess according to one embodiment of the invention. As shown in FIG.4( a), the reconciliation process is initiated when either a userselects an object for the reconciliation process in step 400 a or when achange is detected in an object in step 400 b. An object may be selectedin step 400 a, for example, when a user wants to review and quantify thepotential impact on the data warehouse based on a change to the selectedobject. Similarly, a change in an object may be detected in step 400 bwhen, for example, a schema of an object in one of the source databasesis changed.

Each of steps 400 a and 400 b result in the selected or changed objectbeing added to a list of edited objects (referred to as an “edit list”)in step 402 that is maintained by task management application 146. Instep 404, task management application 146 then determines if the objectplaced in the edit list impacts other objects. For convenience, theobject initially placed in the edit list in response to step 400 a or400 b is referred to below as the “current object”. Step 404 starts arecursive process in which task management application 146 will identifytasks to be performed to reconcile changes to the current object andidentify additional objects that should be added to the edit list. Thenewly identified objects are not processed in step 404 et seq. untiltask management application 146 completes processing of the currentobject.

FIG. 4( b) is a flow chart that depicts the impact analysis process thatis undertaken in step 404 according to one embodiment of the invention.As an initial step, task management application 146 begins an analysisof the forward impact the change to the current object will have onother objects in system 100 (step 406). This is done by referencing thedependency documents dependency management application 144 created inresponse to the particular dependency model and dependency rules thatgovern the relationship between objects in data warehouse 108 and theobjects in the data sources 102, 104, 106.

In the embodiment shown in FIG. 4( b), task management application 146graphically displays the impact analysis of step 406 using a GUI inorder to allow the user to easily understand the relationship betweenpotentially affected objects and to allow the user to more readilyappreciate the impact the change to the given object creates. This isdone, for example, by generating and displaying the results of theimpact analysis in step 406 for the current object as a visual graph.Embodiments of the invention may use a variety of graphicalrepresentations to display the impact or may report the impact to theuser using text rather than a graphical representation.

Next, task management application 146 initiates a process that analyzesthe potential backwards impact the change to the current object may haveon other objects in system 100. This analysis starts by selecting theroot object in the impact analysis (which is the current object—theobject added to the edit list in step 402) for further analysis (step408). In step 410, task management application 146 then determines ifthe selected object may be impacted by other objects in system 100 byinitiating a lineage analysis of the selected object. For reference, theobject selected in step 408 is referred to below as the “selectedobject”.

FIG. 4( c) is a flow chart that depicts the lineage analysis processundertaken in step 410 as well as the process that allows a user toimplement changes to an object in the data warehouse. As shown in FIG.4( c), the selected object is added to a “fix object” list (step 412).Then, in step 414 a lineage analysis is performed on the object. As withthe impact analysis, the lineage analysis is done by referencing thedependency documents task management application 144 created in responseto the dependency model and dependency rules that govern therelationship between objects in data warehousing system 100.

In the embodiment shown in FIG. 4( c), task management application 146graphically displays the lineage analysis using a GUI in order to allowthe user to easily understand the relationship between potentiallyaffected objects and to allow the user to more readily appreciate thepotential affect the change to the given object creates. Embodiments ofthe invention may use a variety of graphical representations to displaythe lineage dependencies or may report the lineage dependencies to theuser using text rather than a graphical representation.

In steps 416 and 418, task management application 146 presents theselected object to the user to determine if the object needs editing. Ifthe user determines that editing is appropriate (e.g., the user wants tochange the schema of the object to take into account a change made toanother object in system 100), the object is presented to the user sothat it can be edited (step 420) using an appropriate editing tool. Thestep of editing an object can include any resolution action on theobject. For example, editing the object may entail adding additional orremoving existing columns to a database table object, adding or deletingobjects used in a mapping object and altering a transformation performedby a transformation object among others. After such editing iscompleted, task management application 146 marks the object as “edited”and removes the object from the fix object list (step 422). In someembodiments of the invention, task management application 146automatically proposes edits to the object and presents the proposedchanges to the user in a manner that allows the user to either confirmthat the proposed change is desired (e.g., by selecting “yes” inresponse to an appropriate prompt), reject the proposed change or editthe proposed change. Task management application 146 then implements theuser's selection.

Next, task management application 146 determines if the edited objectneeds to be reconciled with other objects in its dependency chain (step424), which are sometimes referred to below as linked objects.Alternatively, if the object in the fix object list did not requireediting in step 418, step 424 is performed on the object immediately,skipping steps 420 and 422. Reconciling an object in this sense mayinclude editing one or more of the objects in the selected objectsdependency chain to account for the changes to the selected object. Forexample, consider a first object that includes a copy of a second objector uses a second object in a function where the second object is theselected object. The first object is linked to the second object and canthus be referred to as a “linked object”. When a change is made to thesecond object, copied into or used in the first object, the first objectneeds to be synchronized to include the changed second object. In oneimplementation, the need for reconciling the selected object withanother object in its dependency chain is graphically depicted by abroken or dashed line between the objects in a graph depicting thedependency chain (a broken link). A solid line between the selectedobject and another object represents that reconciliation between theobjects is not necessary (the link is not broken).

In one embodiment, each object in system 100 has a unique signature thatis represented by a character string. If a first object that includes asecond object that was recently edited is synchronized with the secondobject, i.e., includes an accurate duplication of the second object, thesignature of the second object within the first object will match thesignature of the second object. Thus, in such embodiments, step 424determines whether or not objects need to be reconciled by comparingsignature strings. If the strings match, synchronization is notnecessary, but if the strings do not match, the object in the fix objectlist is different in some way from the object in the dependency chain.

If task management application 146 determines that synchronization witha linked object is necessary, it checks to see if the linked object isalready in the edit list in step 426. If the linked object was alreadyin the edit list, the object is flagged in step 428 as such so that theuser can resolve potential conflicts between edits. The objects are thenreconciled in step 430 by allowing the user to change the copy of objectin the dependency chain to reflect the change(s) made to the selectedobject. If the linked object was not in the edit list, step 428 isskipped. After reconciling the selected object with the linked object,task management application 146 then marks the reconciled object (thelinked object) as edited (step 432) and adds the reconciled object tothe fix object list (step 434).

Next, task management application 146 checks if the selected objectneeds to be synchronized with another of its linked objects in system100. If further synchronization is necessary (decision in step 436),appropriate steps in steps 424-434 are repeated with each remaininglinked object that requires synchronization. If further synchronizationis not necessary (decision in step 436), the task management applicationdetermines, in step 438, whether or not there are additional objects inthe fix object list. If additional objects are in the list, appropriatesteps from steps 416-438 are repeated until all the objects that are inthe lineage of the selected object have been addressed.

When no more additional objects are in the fixed object list, all theobjects that were edited in step 420 and marked as edited in step 422are added to the to edit list in step 440 and the lineage analysis ofthe selected object is complete (step 442).

Referring back to FIG. 4( b), task management application 146 thensynchronizes all outbound links in step 444. If there are additionalobjects in the impact graph generated during step 406, each remainingobject is processed sequentially as the selected object according toappropriate steps in the sequence of steps 408-444 until all the objectsin the impact graph have been addressed (decision of step 446). Once thelast object in the edit list has been handled, the analysis andreconciliation of the objects in the impact graph of the current objectis complete (step 448) and the current object is removed from the “toedit” list.

Referring back to FIG. 4( a), task management application 146 nextdetermines if there are any remaining objects in the “to edit” list(step 450). If there are, impact and lineage analysis are performed forthe next object in the list, which becomes the new “current object”, asdescribed above with respect to steps 404-448. When all the objectsadded to the to edit list during the reconciliation process have beenaddressed, the reconciliation process is finished (steps 452, 454).

The approaches for reconciling a data warehouse described above are notlimited to the specific steps or the specific order of steps as depictedin with respect to FIGS. 4( a)-(c). In some embodiments, some of thesteps may not be performed, additional steps may be performed and/or theorder may be changed, depending upon the requirements of a particularapplication. Also, embodiments of the invention allow a user to skipselected steps and proceed on to perform other tasks recommended by taskmanagement application 146. The user may then return to the skippedsteps (e.g., synchronizing a particular object) and complete theappropriate tasks required to reconcile objects associated with theskipped tasks at a later time. Additionally, the steps depicted in FIGS.4( a)-(c) are not mutually exclusive and may be combined.

In one embodiment of the invention, the tasks and actions identified andperformed to reconcile the data warehouse are memorialized in a scriptthat can be used to reconcile the data warehouse when the process ofFIGS. 4( a)-(c) is complete. In this embodiment, the editing and otheractions taken by the user in the steps associated with FIGS. 4( a)-(c)do not actually change the underlying objects in system 100. Instead, ascript is generated that contains commands which perform the actions,including edits and other changes to objects in system 100, on the datawarehouse. The script is completed at the end of the reconciliationprocess in step 452 and can then be run against the metadata repositoryto invoke the changes at a later date and time selected by a user instep 454. In one implementation the synchronization script used is theOracle MetaBase (OMB) scripting language available with the OracleWarehouse Builder.

In order to better understand and appreciate the invention, an exampleof a reconciliation process according to one embodiment of the inventionis described below in conjunction with FIGS. 5 and 6( a)-(m), where FIG.5 depicts the dependencies in a portion of an example data warehouse andFIGS. 6( a)-(m) are screen shots depicting a GUI and sequence of eventsaccording to one embodiment of the invention. It is important toappreciate that the example is illustrative in nature only.

As shown in FIG. 5, the exemplary data warehouse includes a sourcedatabase table object 500 (Customers) and several database warehouseobjects including a database table object 502 (CustVita), a functionobject 504 (GetFullName); and a mapping object 506 (CustInfo) that mapsfields in object 500 to objects 502, 504 and 506. Mapping object 506depends on each of objects 502, 504 and 506.

In FIG. 6( a), a list of warehouse objects are shown from a warehousebuilt in Oracle Warehouse Builder that includes the objects shown inFIG. 5 and others. The warehouse includes a source database entitledBUSINESS SOURCES that includes two source table objects: Customers table(object 508) and Orders table. In this example, a user selects theCustomers table, by, for example, pointing to it with a mouse andclicking on the left mouse button, and then selects to analyze theimpact a change to the Customer table would have on the data warehouseusing an appropriate menu selection. Such a selection may represent, forexample, step 400 a of FIG. 4( a).

FIG. 6( b) shows three different windows that are part of the GUI thatassists the user in the reconciliation process. Window 600 graphicallydepicts the dependencies between the selected object and other objectsin the data warehouse. In window 600 user interface (UI) objects areshown that are graphical representations of corresponding objects in thedata warehouse system. Thus, UI object 606 represents the Customerstable object in the Business Sources database, while UI object 608represents the map object CustInfo. In window 600, connecting linesbetween objects indicate dependencies between the objects correspondingto the UI objects shown in the window. Thus, as shown, the CustInfomapping object depends on the Customers table object so that a change inthe Customers table object will require a change to the CustInfo object.

Window 602 displays output of the task management application 146 in theform of tasks suggested by application 146 that should be accomplishedto resolve the impact of the selected object on the warehouse. As shownin FIG. 6( b), window 602 includes rows of tasks 610 as well as statusinformation (completed or skipped) in status columns 612 and 614.Finally, window 604 lists specific objects that may need to be updatedfor a particular task listed in window 602. The information shown inFIG. 6( b) represents steps 402, 204 and 406 of FIGS. 4( a) and (b).

In FIG. 6( c), the task management application has determined that inorder to resolve the impact of the Customers table, three separate tasksneed to be performed: resolve the lineage of Customers table (task 609a), synchronize the Customers table to the CustInfo mapping (task 609 b)and resolve the lineage of the CustInfo mapping (task 609 c). Thecalculation of these tasks corresponds to steps 408, 410, 412 and 414(task 609 a), step 450 (step 609 b) and steps 452, 408, 410 (task 609 c)of FIGS. 4( b) and (c). Note, in this particular example, the Customersobject does not have any lineage, thus, the generation of the lineageanalysis in step 414 (task 609 a) does not add any additional objects towindow 600.

FIG. 6( d) shows that the first task application 146 suggests to beperformed is to resolve the lineage of the Customers table (task 609 a).In order to do this, task management application 146 suggests editingthe table Customers first as indicated by row 610 and determined bysteps 416 and 418 of FIG. 4( b). Note that window 600 shows that tableCustomers is the object currently being analyzed by placing a box aroundUI 506 that represents the Customers table. FIG. 6( e), which can beused to facilitate table editing per step 420, shows that two newcolumns, 612 a and 612 b, were added to Customers table 612 to track acustomer's middle initial and country code, respectively.

Referring now to FIG. 6( f), after editing Customer table 612, thestatus of task 610 is marked as completed in window 602 (step 422 inFIG. 4( c)) and the task assistant suggests synchronizing the tableobject Customers and the map object CustInfo that is using the tableCustomers (task 609 b). Synchronization window 604 shows that twocolumns, Middle_Initial and Country_Code, need to be created asindicated by rows 614 a and 614 b in window 604. The steps discussedwith respect to FIG. 6( f), represent steps 422, 424, 436, 438, 440, 442and 450. Steps 426-434 were skipped because there is no lineage for theCustInfo map object and thus there were no links on the lineage graphthat needed to be reconciled. Step 442 updates the status of the“Resolve lineage of Customers” line in window 602 to “completed” andstep 450 suggests synchronizing the Customers table to the CustInfomapping that is using the table Customers.

As shown in FIG. 6( g), which corresponds to steps 452, 408 and 410 inFIG. 4( b), after synchronizing the Customers table to the CustInfomapping, the link between the two objects in window 600 is marked with agreen check to indicate the objects have been synchronized so that theuser can better appreciate which tasks have been accomplished and whichstill need to be done. Task management application 146 then suggestsresolving the lineage of the CustInfo table, which represents the objectafter the root object (Customers) in the impact graph generated in step406. This expands the graph shown in window 600 to include two newlineage objects of CustInfo: a table Cust_Vita and a functionGetFullName, represented by UI objects 616 and 618, respectively. Notethat FIG. 5 shows that both Cust_Vita table object and GetFullNamefunction object are in the lineage of map object CustInfo.

In order to resolve the lineage of the CustInfo object, task managementapplication 146 suggests editing the object CustInfo itself in FIG. 6(h), window 602, row 620. This corresponds to steps 416 and 418 in FIG.4( b). Next, the editor is invoked as shown in FIG. 6( i). The CustomersUI object includes new fields Middle_Initial (field 612 a) andCountry_Code (field 612 b) that were added to the table object Customers612 after the map object Cust Info was last synchronized with customerswere last synchronized with Customers. In this example, the user decidesto use the editor to add the Middle_Initial field to GetFullNamefunction 620 as shown in FIG. 6( j). Editing the objects as shown inFIGS. 6( i) and (j) corresponds to Step 420 in FIG. 4( b).

Referring now to FIG. 6( k), window 602 now shows that the editing ofthe CustInfo object has been completed in row 622. Because thecompletion of task 622 resulted in editing the function GetFullName,task management application 146 now suggests that the objects CustInfoand GetFullName be synchronized as indicated in row 624. Thiscorresponds to steps 422, 424, 426 and 430 of FIG. 4( b). Thesynchronization plan is shown in FIG. 6( l) and includes forming a newparameter Middle_Initial (field 626) in the function GetFullName. Thiscorresponds to step 430 in FIG. 4( b).

After CustInfo and GetFullName have been synchronized, the link betweenthe objects is marked with a green check as shown in FIG. 6( m) as wasdone for the link between Customers and CustInfo in FIG. 6( g). Next,task management application 146 suggests resolving the impact that maybe caused by the change in the GetFullName function and therefore addsthe action “resolve impact GetFullName” to row 428 at the end of thetask list. This corresponds to steps 432, 434, 436, 438, 440, 442, 460and 462 in FIGS. 4( a)-(c). Because there is no potential further impactfrom the GetFullName function, the task to resolve the impact is markedas complete and all steps to resolve the impact from the change to theCustomers table object have been completed. Thus, the process ofreconciling the warehouse due to the change in the table object isfinished.

Although embodiments of the invention have been described herein in thecontext of performing metadata reconciliation in data warehousingapplications, the invention is not limited to this context and isapplicable to any context where metadata reconciliation is performed.For example, the approach described herein is applicable to performingmetadata reconciliation in database system environments.

FIG. 7 is a block diagram that illustrates a computer system 700 uponwhich an embodiment of the invention may be implemented. Computer system700 includes a bus 702 or other communication mechanism forcommunicating information, and a processor 704 coupled with bus 702 forprocessing information. Computer system 700 also includes a main memory706, such as a random access memory (RAM) or other dynamic storagedevice, coupled to bus 702 for storing information and instructions tobe executed by processor 704. Main memory 706 also may be used forstoring temporary variables or other intermediate information duringexecution of instructions to be executed by processor 704. Computersystem 700 further includes a read only memory (ROM) 708 or other staticstorage device coupled to bus 702 for storing static information andinstructions for processor 704. A storage device 710, such as a magneticdisk or optical disk, is provided and coupled to bus 702 for storinginformation and instructions.

Computer system 700 may be coupled via bus 702 to a display 712, such asa cathode ray tube (CRT), for displaying information to a computer user.An input device 714, including alphanumeric and other keys, is coupledto bus 702 for communicating information and command selections toprocessor 704. Another type of user input device is cursor control 716,such as a mouse, a trackball, or cursor direction keys for communicatingdirection information and command selections to processor 704 and forcontrolling cursor movement on display 712. This input device typicallyhas two degrees of freedom in two axes, a first axis (e.g., x) and asecond axis (e.g., y), that allows the device to specify positions in aplane.

The invention is related to the use of computer system 700 forreconciling a data warehouse. According to one embodiment of theinvention, data warehouse reconciliation is provided by computer system700 in response to processor 704 executing one or more sequences of oneor more instructions contained in main memory 706. Such instructions maybe read into main memory 706 from another computer-readable medium, suchas storage device 710. Execution of the sequences of instructionscontained in main memory 706 causes processor 704 to perform the processsteps described herein. One or more processors in a multi-processingarrangement may also be employed to execute the sequences ofinstructions contained in main memory 706. In alternative embodiments,hard-wired circuitry may be used in place of or in combination withsoftware instructions to implement the invention. Thus, embodiments ofthe invention are not limited to any specific combination of hardwarecircuitry and software.

The term “computer-readable medium” as used herein refers to any mediumthat participates in providing instructions to processor 704 forexecution. Such a medium may take many forms, including but not limitedto, non-volatile media, volatile media, and transmission media.Non-volatile media includes, for example, optical or magnetic disks,such as storage device 710. Volatile media includes dynamic memory, suchas main memory 706. Transmission media includes coaxial cables, copperwire and fiber optics, including the wires that comprise bus 702.Transmission media can also take the form of acoustic or light waves,such as those generated during radio wave and infrared datacommunications.

Common forms of computer-readable media include, for example, a floppydisk, a flexible disk, hard disk, magnetic tape, or any other magneticmedium, a CD-ROM, any other optical medium, punch cards, paper tape, anyother physical medium with patterns of holes, a RAM, a PROM, and EPROM,a FLASH-EPROM, any other memory chip or cartridge, a carrier wave asdescribed hereinafter, or any other medium from which a computer canread.

Various forms of computer readable media may be involved in carrying oneor more sequences of one or more instructions to processor 704 forexecution. For example, the instructions may initially be carried on amagnetic disk of a remote computer. The remote computer can load theinstructions into its dynamic memory and send the instructions over atelephone line using a modem. A modem local to computer system 700 canreceive the data on the telephone line and use an infrared transmitterto convert the data to an infrared signal. An infrared detector coupledto bus 702 can receive the data carried in the infrared signal and placethe data on bus 702. Bus 702 carries the data to main memory 706, fromwhich processor 704 retrieves and executes the instructions. Theinstructions received by main memory 706 may optionally be stored onstorage device 710 either before or after execution by processor 704.

Computer system 700 also includes a communication interface 718 coupledto bus 702. Communication interface 718 provides a two-way datacommunication coupling to a network link 720 that is connected to alocal network 722. For example, communication interface 718 may be anintegrated services digital network (ISDN) card or a modem to provide adata communication connection to a corresponding type of telephone line.As another example, communication interface 718 may be a local areanetwork (LAN) card to provide a data communication connection to acompatible LAN. Wireless links may also be implemented. In any suchimplementation, communication interface 718 sends and receiveselectrical, electromagnetic or optical signals that carry digital datastreams representing various types of information.

Network link 720 typically provides data communication through one ormore networks to other data devices. For example, network link 720 mayprovide a connection through local network 722 to a host computer 724 orto data equipment operated by an Internet Service Provider (ISP) 726.ISP 726 in turn provides data communication services through theworldwide packet data communication network now commonly referred to asthe “Internet” 728. Local network 722 and Internet 728 both useelectrical, electromagnetic or optical signals that carry digital datastreams. The signals through the various networks and the signals onnetwork link 720 and through communication interface 718, which carrythe digital data to and from computer system 700, are exemplary forms ofcarrier waves transporting the information.

Computer system 700 can send messages and receive data, includingprogram code, through the network(s), network link 720 and communicationinterface 718. In the Internet example, a server 730 might transmit arequested code for an application program through Internet 728, ISP 726,local network 722 and communication interface 718. In accordance withthe invention, one such downloaded application provides for datawarehouse reconciliation as described herein.

The received code may be executed by processor 704 as it is received,and/or stored in storage device 710, or other non-volatile storage forlater execution. In this manner, computer system 700 may obtainapplication code in the form of a carrier wave.

Having fully described several embodiments of the present invention,other equivalent or alternative methods of practicing the presentinvention will be apparent to those skilled in the art. For example,warehouse manager 110 may perform more, fewer and/or different functionsthan described herein and such functions may be implemented by awarehouse manager 110 using more, fewer and/or different components thanthose shown in FIG. 1. Also, while the reconciliation process depictedin FIGS. 4( a)-(c) initiated impact and lineage analysis in a particularorder, the order that such an analysis is initiated may be reversed inother embodiments. These and other embodiments as well as alternativesand equivalents to the invention will be recognizable to those of skillin the art after reading the description of the present invention. Thescope of the invention should not, therefore, be determined solely byreference to the above description, but instead should be determinedwith reference to the appended claims along with their full scope ofequivalents and alternatives.

APPENDIX A <?xml version=“1.0” encoding=“UTF-8”?> <xsd:schemaxmlns:xsd=“http://www.w3.org/2000/10/XMLSchema”      elementFormDefault=“qualified”> <!--.....................MainDocumentation..........................--> <xsd:annotation>   <xsd:documentation xml:lang=“en”>    Lineage and Impact AnalysisSchema    Copyright 2001 Oracle Corp. All rights reserved.   </xsd:documentation> </xsd:annotation><!--.....................AnalysisType................................--> <xsd:element name=“TREE”>   <xsd:complexType>       <xsd:sequence minOccurs=“1” maxOccurs=“1”>         <xsd:element name=“DEPENDENCY” type=“DependencyType”/>      </xsd:sequence>   </xsd:complexType>    <xsd:attributename=“CreateDate” type=“xsd:dateTime” use=“optional”/>    <xsd:attributename=“UpdateDate” type=“xsd:dateTime” use=“optional”/>    <xsd:attribute name=“LIAType” type=“LIAType” use=“required”/></xsd:element> <!--...................DependencyType................................--> <xsd:complexTypename=“DependencyType”>    <xsd:sequence minOccurs=“0”maxOccurs=“unbounded”>       <xsd:element name=“OBJECT”type=“ObjectType”/>    </xsd:sequence>     <!-- dependency descriptionis for expression usage -->    <xsd:attribute name=“Description”type=“xsd:string” use=“required”/>    <xsd:attribute name=“Impact”type=“ImpactType” use=“required”/>    <xsd:attribute name=‘Complete”type=“xsd:boolean” use=“required”/> </xsd:complexType><!--...................Object ReferenceType..........................--> <xsd:complexTypename=“ObjectReferenceType”>    <xsd:sequence minOccurs=“1”maxOccurs=“1”>       <xsd:sequence minOccurs=“1” maxOccurs=“1”>         <xsd:element name=“NAME” type=“xsd:string”/>         <xsd:element name=“TYPE” type=“xsd:string”/>         <xsd:element name=“ID” type=“xsd:ID”/>          <xsd:elementname=“DESCRIPTION” type=“xsd:string”/>       </xsd:sequence>      <xsd:sequence minOccurs=“0” maxOccurs=“1”>          <xsd:sequenceminOccurs=“0” maxOccurs=“1”>             <xsd:element name=“RPTLINK”type=“xsd:uri- reference”/>          </xsd:sequence>         <xsd:sequence minOccurs=“0” maxOccurs=“1”>            <xsd:element name=“NAVLINK” type=“xsd:uri- reference”/>         </xsd:sequence>          <xsd:sequence minOccurs=“0”maxOccurs=“1”>             <xsd:element name=“IMGLINK” type=“xsd:uri-reference”/>          </xsd:sequence>       </xsd:sequence>   </xsd:sequence> </xsd:complexType> <!--...................ObjectType....................................--> <xsd:complexTypename=“ObjectType”>    <complexContent>       <extensionbase=“ObjectReferenceType”>          <xsd:sequence minOccurs=“1”maxOccurs=“1”>             <xsd:sequence minOccurs=“1” maxOccurs=“1”>               <xsd:element name=“PARENT” type=“ParentType”/>            </xsd:sequence>             <xsd:sequence minOccurs=“0”maxOccurs=“1”>                <xsd:element name=“CONNECTORLIST”type=“ConnectorListType”/>             </xsd:sequence>            <xsd:sequence minOccurs=“0” maxOccurs=“unbounded”>               <xsd:element name=“DEPENDENCY” type=“DependencyType”/>            </xsd:sequence>          </xsd:sequence>       </extension>   </complexContent> </xsd:complexType> <!--...................ImpactType...................................--> <xsd:simpleTypename=“ImpactType”>    <xsd:restriction base=“xsd:NMTOKEN”>      <xsd:enumeration value=“NONE”/>       <xsd:enumerationvalue=“CREATE”/>       <xsd:enumeration value=“DELETE”/>      <xsd:enumeration value=“UPDATE”/>       <xsd:enumerationvalue=“UNKNOWN”/>    </xsd:restriction> </xsd:simpleType><!--...................LIA Type...................................--><xsd:simpleType name=“LIAType”>    <xsd:restriction base=“xsd:NMTOKEN”>      <xsd:enumeration value=“lineage”/>       <xsd:enumerationvalue=“impact”/>    </xsd:restriction> </xsd:simpleType><!--...................Parent Type...................................--><xsd:complexType name=“ParentType”>    <complexContent>       <extensionbase=“ObjectReferenceType”/>    </complexContent> </xsd:complexType><!--...................ConnectorList Type............................--><xsd:complexType name=“ConnectorListType”>    <xsd:sequenceminOccurs=“1” maxOccurs=“unbounded”>       <xsd:element name=“CONNECTOR”type=“ConnectorType”/>    </xsd:sequence> </xsd:complexType><!--...................Connector Type................................--><xsd:complexType name=“ConnectorType”>    <xsd:sequence minOccurs=“1”maxOccurs=“1”>       <xsd:element name=“FROM” type=“FromType”/>      <xsd:element name=“TO” type=“ToType”/>    </xsd:sequence></xsd:complexType> <!--..................ConnectorBase.................................--> <xsd:complexTypename=“ConnectorBaseReference”>    <xsd:sequence>       <xsd:attributename=“ID” type=“xsd:ID” use=“required”/>       <xsd:attributename=“NAME” type=“xsd:string” use=“required”/>       <xsd:attributename=“TYPE” type=“xsd:string” use=“required”/>    </xsd:sequence></xsd:complexType> <!--..................FromType......................................--> <xsd:complexTypename=“FromType”>    <complexContent>       <extensionbase=“ConnectorBaseReference”/>    </complexContent> </xsd:complexType><!--.................To Type.........................................--><xsd:complexType name=“TOType”>    <complexContent>       <extensionbase=“ConnectorBaseReference”/>    </complexContent> </xsd:complexType></xsd:schema>

APPENDIX B <?xml version=“1.0” encoding=“UTF-8” ?> - <TREExmlns:xsi=“http:/www.w3.org/2000/10/XMLSchema-instance”  xsi:noNamespaceSchemaLocation=“lia.xsd” CreateDate=“” UpdateDate=“24-  OCT-02” Description=“unknown” LIAType=“impact”>    30A  - <DEPENDENCYImpact=“NONE” Description=“NONE” Complete=“YES”>   - <OBJECT>    <NAME>ORACLE_CATEGORIES</NAME> 31A     <TYPE>TABLE</TYPE>    <ID>13578</ID>    <DESCRIPTION>ORACLE_ORDER_SYSTEM.ORACLE_CATEGORIES</DESCRIPTION>   - <PARENT>      <NAME>ORACLE_ORDER_SYSTEM</NAME>     <TYPE>PACKAGED_MODULE</TYPE>      <ID>13567</ID>     <DESCRIPTION>DRUGDEPO_WH.ORACLE_ORDER_SYSTEM</DESCRIPTION>    </PARENT>    - <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=“YES”>     - <OBJECT>       <NAME>ORACLE_CATEGORIES</NAME>      <TYPE>TABLE</TYPE>       <ID>13578</ID>      <DESCRIPTION>ORACLE_ORDER_SYSTEM.ORACLE_CATEGORIES<       /DESCRIPTION>      - <PARENT>       <NAME>ORACLE_ORDER_SYSTEM</NAME>       <TYPE>PACKAGED_MODULE</TYPE>        <ID>13567</ID>       <DESCRIPTION>DRUGDEPO_WH.ORACLE_ORDER_SYSTEM</DESCRIPTION>      </PARENT>       <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=“NO”/>      </OBJECT>     </DEPENDENCY>    - <DEPENDENCYImpact=“NONE” Description=“NONE” Complete=“YES”>     - <OBJECT>      <NAME>UNION_1</NAME> 33A       <TYPE>SetOperation</TYPE>      <ID>14972</ID>      <DESCRIPTION>MAP_PRODUCTS.UNION_1</DESCRIPTION>      - <PARENT>       <NAME>MAP_PRODUCTS</NAME> 33B        <TYPE>TRANSFORM_MAP</TYPE>       <ID>14971</ID>       <DESCRIPTION>DRUGDEPO_WH.MAP_PRODUCTS</DESCRIPTION>      </PARENT>       <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=“YES”/>      - <DEPENDENCY Impact=“NONE” Description=“NONE”       Complete=“YES”>       - <OBJECT>         <NAME>PRODUCTS</NAME>34A         <TYPE>DIMENSION</TYPE>         <ID>13405</ID>        <DESCRIPTION>DRUGWAREHOUSE.PRODUCTS</DESCRIPTION>        -<PARENT>          <NAME>DRUGWAREHOUSE</NAME> 37A         <TYPE>DATAWAREHOUSE</TYPE>          <ID>13119</ID>         <DESCRIPTION>DRUGDEPO_WH.DRUGWAREHOUSE</DESCRIPTION>        </PARENT>         <DEPENDENCY Impact=“NONE” Description=“NONE”         Complete=“YES” /> 37C         <DEPENDENCY Impact=“NONE”Description=“NONE”          Complete=“YES”>            <OBJECT>             <NAME>SALES</NAME>              <TYPE>CUBE</TYPE>             <ID>16535</ID>             <DESCRIPTION>DRUGWAREHOUSE.SALES</DESCRIPTION>             <PARENT>                <NAME>DRUGWAREHOUSE</NAME>               <TYPE>DATAWAREHOUSE</TYPE>                <ID>16122</ID>               <DESCRIPTION>DRUGDEPO_WH.DRUGWAREHOUSE               </DESCRIPTION>              </PARENT>             <DEPENDENCY Impact=“NONE”              Description=“NONE”Complete=“YES”/>            </OBJECT>         </DEPENDENCY>       </OBJECT>       </DEPENDENCY>      - <DEPENDENCY Impact=“NONE”Description=“NONE”        Complete=“YES”>       - <OBJECT>        <NAME>EXPR</NAME> 35A         <TYPE>ExpressionTransform</TYPE>        <ID>15149</ID>        <DESCRIPTION>MAP_PRODUCTS.EXPR</DESCRIPTION>        - <PARENT>         <NAME>MAP_PRODUCTS</NAME>          <TYPE>TRANSFORM_MAP</TYPE>         <ID>14971</ID>         <DESCRIPTION>DRUGDEPO_WH.MAP_PRODUCTS</DESCRIPTION>        </PARENT>         <DEPENDENCY Impact=“NONE” Description=“NONE”         Complete=“YES” />        - <DEPENDENCY Impact=“NONE”Description=“NONE”          Complete=“YES”>         - <OBJECT>          <NAME>PRODUCTS</NAME> 38A           <TYPE>DIMENSION</TYPE>          <ID>13405</ID>          <DESCRIPTION>DRUGWAREHOUSE.PRODUCTS</DESCRIPTION>          -<PARENT>            <NAME>DRUGWAREHOUSE</NAME>           <TYPE>DATAWAREHOUSE</TYPE>            <ID>13119</ID>           <DESCRIPTION>DRUGDEPO_WH.DRUGWAREHOUSE</DESCRIPTION>          </PARENT>           <DEPENDENCY Impact=“NONE”Description=“NONE”            Complete=“NO” /> 38C          </OBJECT>        </DEPENDENCY>        </OBJECT>       </DEPENDENCY>     </OBJECT>     </DEPENDENCY>    </OBJECT>   </DEPENDENCY>  </TREE>

APPENDIX C <?xml version=“1.0” encoding=“UTF-8” ?> - <TREExmlns:xsi=“http://www.w3.org/2000/10/XMLSchema-instance”  xsi:noNamespaceSchemaLocation=“lia.xsd” CreateDate=“”UpdateDate=“24-OCT-02”   Description=“unknown” LIAType=“lineage”>  -<DEPENDENCY Impact=“NONE” Description=“NONE” Complete=“YES”>   -<OBJECT>     <NAME>GEOGRAPHIES</NAME>     <TYPE>DIMENSION</TYPE>    <ID>13483</ID>    <DESCRIPTION>DRUGWAREHOUSE.GEOGRAPHIES</DESCRIPTION>    - <PARENT>     <NAME>DRUGWAREHOUSE</NAME>      <TYPE>DATAWAREHOUSE</TYPE>     <ID>13119</ID>     <DESCRIPTION>DRUGDEPO_WH.DRUGWAREHOUSE</DESCRIPTION>     </PARENT>   - <DEPENDENCY Impact=“NONE” Description=“NONE” Complete=“YES”>     -<OBJECT>       <NAME>COUNTRY_LK</NAME>       <TYPE>TABLE</TYPE>      <ID>13189</ID>      <DESCRIPTION>DRUGWAREHOUSE.COUNTRY_LK</DESCRIPTION>      -<PARENT>        <NAME>DRUGWAREHOUSE</NAME>       <TYPE>DATAWAREHOUSE</TYPE>        <ID>13119</ID>       <DESCRIPTION>DRUGDEPO_WH.DRUGWAREHOUSE</DESCRIPTION>      </PARENT>       <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=“YES” />      - <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=“YES”>       - <OBJECT>         <NAME>COUNTRYLK_TXT</NAME>        <TYPE>FILE</TYPE>         <ID>13109</ID>        <DESCRIPTION>SOURCEFILES.COUNTRYLK_TXT</DESCRIPTION>        -<PARENT>          <NAME>SOURCEFILES</NAME>         <TYPE>FILE_MODULE</TYPE>          <ID>13102</ID>         <DESCRIPTION>DRUGDEPO_WH.SOURCEFILES</DESCRIPTION>        </PARENT>         <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=          “YES”/>        </OBJECT>       </DEPENDENCY>      -<DEPENDENCY Impact=“NONE” Description=“NONE” Complete=“YES”>       -<OBJECT>         <NAME>CUSTOMERS</NAME>         <TYPE>TABLE</TYPE>        <ID>13618</ID>        <DESCRIPTION>ORACLE_ORDER_SYSTEM.CUSTOMERS</DESCRIPTION>       - <PARENT>          <NAME>ORACLE_ORDER_SYSTEM</NAME>         <TYPE>PACKAGED_MODULE</TYPE>          <ID>13567</ID>         <DESCRIPTION>DRUGDEPO_WH.ORACLE_ORDER_SYSTEM</DESCRIPTION>        </PARENT>         <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=          “YES”/>        </OBJECT>       </DEPENDENCY>     </OBJECT>     </DEPENDENCY>    - <DEPENDENCY Impact=“NONE”Description=“NONE” Complete=“YES”>     - <OBJECT>      <NAME>SALES_REGIONS</NAME>       <TYPE>TABLE</TYPE>      <ID>13568</ID>      <DESCRIPTION>ORACLE_ORDER_SYSTEM.SALES_REGIONS</DESCRIPTION>     - <PARENT>        <NAME>ORACLE_ORDER_SYSTEM</NAME>       <TYPE>PACKAGED_MODULE</TYPE>        <ID>13567</ID>       <DESCRIPTION>DRUGDEPO_WH.ORACLE_ORDER_SYSTEM</DESCRIPTION>      </PARENT>       <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=“YES” />      - <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=“YES”>       - <OBJECT>         <NAME>CUSTOMERS</NAME>        <TYPE>TABLE</TYPE>         <ID>13618</ID>        <DESCRIPTION>ORACLE_ORDER_SYSTEM.CUSTOMERS</DESCRIPTION>       - <PARENT>          <NAME>ORACLE_ORDER_SYSTEM</NAME>         <TYPE>PACKAGED_MODULE</TYPE>          <ID>13567</ID>         <DESCRIPTION>DRUGDEPO_WH.ORACLE_ORDER_SYSTEM</DESCRIPTION>        </PARENT>         <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=          “NO”/>        </OBJECT>       </DEPENDENCY>     </OBJECT>     </DEPENDENCY>    - <DEPENDENCY Impact=“NONE”Description=“NONE” Complete=“YES”>     - <OBJECT>      <NAME>STATE_TAX</NAME>       <TYPE>TABLE</TYPE>      <ID>13573</ID>      <DESCRIPTION>ORACLE_ORDER_SYSTEM.STATE_TAX</DESCRIPTION>      -<PARENT>        <NAME>ORACLE_ORDER_SYSTEM</NAME>       <TYPE>PACKAGED_MODULE</TYPE>        <ID>13567</ID>       <DESCRIPTION>DRUGDEPO_WH.ORACLE_ORDER_SYSTEM</DESCRIPTION>      </PARENT>       <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=“YES” />      - <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=“YES”>       - <OBJECT>         <NAME>CUSTOMERS</NAME>        <TYPE>TABLE</TYPE>         <ID>13618</ID>        <DESCRIPTION>ORACLE_ORDER_SYSTEM.CUSTOMERS</DESCRIPTION>       - <PARENT>          <NAME>ORACLE_ORDER_SYSTEM</NAME>         <TYPE>PACKAGED_MODULE</TYPE>          <ID>13567</ID>         <DESCRIPTION>DRUGDEPO_WH.ORACLE_ORDER_SYSTEM</DESCRIPTION>        </PARENT>         <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=           “NO”/>        </OBJECT>       </DEPENDENCY>     </OBJECT>     </DEPENDENCY>    - <DEPENDENCY Impact=“NONE”Description=“NONE” Complete=“YES”>     - <OBJECT>      <NAME>CUSTOMERS</NAME>       <TYPE>TABLE</TYPE>      <ID>13618</ID>      <DESCRIPTION>ORACLE_ORDER_SYSTEM.CUSTOMERS</DESCRIPTION>      -<PARENT>        <NAME>ORACLE_ORDER_SYSTEM</NAME>       <TYPE>PACKAGED_MODULE</TYPE>        <ID>13567</ID>       <DESCRIPTION>DRUGDEPO_WH.ORACLE_ORDER_SYSTEM</DESCRIPTION>      </PARENT>       <DEPENDENCY Impact=“NONE” Description=“NONE”Complete=“NO” />      </OBJECT>     </DEPENDENCY>    </OBJECT>  </DEPENDENCY>  </TREE>

What is claimed is:
 1. A method comprising: receiving, at one or morecomputer systems, a change in a definition of a first object describinga first portion of data stored in a data warehouse; determining, withone or more processors associated with the one or more computer systems,at least one task to propagate the change in the definition of the firstobject to metadata of the first object; reconciling, with one or moreprocessors associated with one or more computer systems, lineage of thefirst object to identify a set of objects that are impacted by thechange to the definition of the first object, each object in the secondset of objects describing a second portion of the data stored in thedata warehouse different from the first portion of the data;determining, with the one or more processors associated with the one ormore computer systems, at least one additional task to propagate animpact of the change to metadata of one or more objects in the set ofobjects that are impacted by the change to the definition of the firstobject; and rendering, with the one or more processors associated withthe one or more computer systems, a visual representation of the impactof the change to the metadata of the one or more objects within agraphical user interface.
 2. The method of claim 1 further comprisinggenerating a script of all tasks needed to propagate the impact of thechange to the metadata of the one or more objects.
 3. The method ofclaim 2 further comprising executing the script in response to a commandprovided by a user via the user interface.
 4. The method of claim 2wherein reconciling, with the one or more processors associated with oneor more computer systems, the lineage of the first object comprisesgenerating a graph based on relationships between the first object andeach object in the set of objects.
 5. The method of claim 4 whereinrendering, with the one or more processors associated with the one ormore computer systems, the visual representation of the impact of thechange to the metadata of the one or more objects within the graphicaluser interface comprises rendering the graph depicting the relationshipsbetween the first object and each object in the set of objects.
 6. Themethod of claim 1 wherein reconciling, with the one or more processorsassociated with one or more computer systems, the lineage of the firstobject comprises performing a lineage analysis within an impactanalysis.
 7. The method of claim 1 further comprising generating updatedversions of the first object and each object in the set of objects basedupon completion of a set of tasks.
 8. The method of claim 1 whereinreconciling, with the one or more processors associated with one or morecomputer systems, the lineage of the first object to identify the set ofobjects comprises managing hierarchical dependencies among objects basedon dependency documents that conform to a common dependency format. 9.The method of claim 1 further comprising rendering a visualrepresentation of any determined tasks within the graphical userinterface.
 10. The method of claim 1 further comprising rendering one ormore suggestions within the graphical user interface based on anydetermined tasks.
 11. A non-transitory computer-readable medium storinga computer program product which when executed by a processor of acomputer system cause the processor to: receive a change in a definitionof a first object describing a first portion of data stored in a datawarehouse; determine at least one task to propagate the change in thedefinition of the first object to metadata of the first object;reconcile lineage of the first object to identify a set of objects thatare impacted by the change to the definition of the first object, eachobject in the second set of objects describing a second portion of thedata stored in the data warehouse different from the first portion ofthe data; determine at least one additional task to propagate an impactof the change to metadata of one or more objects in the set of objectsthat are impacted by the change to the definition of the first object;and render a visual representation of the impact of the change to themetadata of the one or more objects within a graphical user interface.12. The non-transitory computer-readable medium of claim 11 wherein thecomputer program product further causes the processor to generate ascript of all tasks needed to propagate the impact of the change to themetadata of the one or more objects.
 13. The non-transitorycomputer-readable medium of claim 12 wherein the computer programproduct further causes the processor to execute the script in responseto a command provided by a user via the user interface.
 14. Thenon-transitory computer-readable medium of claim 12 wherein to reconcilethe lineage of the first object the computer program product furthercauses the processor to generate a graph based on relationships betweenthe first object and each object in the set of objects.
 15. Thenon-transitory computer-readable medium of claim 14 wherein to renderthe visual representation of the impact of the change to the metadata ofthe one or more objects within the graphical user interface the computerprogram product further causes the processor to render the graphdepicting the relationships between the first object and each object inthe set of objects.
 16. The non-transitory computer-readable medium ofclaim 11 wherein to reconcile the lineage of the first object thecomputer program product causes the processor to performing a lineageanalysis within an impact analysis.
 17. The non-transitorycomputer-readable medium of claim 11 wherein the computer programproduct further causes the processor to generate updated versions of thefirst object and each object in the set of objects based upon completionof a set of tasks.
 18. The non-transitory computer-readable medium ofclaim 11 wherein to reconcile the lineage of the first object toidentify the set of objects the computer program product causes theprocessor to managing hierarchical dependencies among objects based ondependency documents that conform to a common dependency format.
 19. Themethod of claim 1 further comprising rendering one or more suggestionswithin the graphical user interface based on any determined tasks.
 20. Asystem comprising: a hardware processor; and a memory storing a set ofinstructions which when executed by the processor cause the processorto: receive a change in a definition of a first object describing afirst portion of data stored in a data warehouse; determine at least onetask to propagate the change in the definition of the first object tometadata of the first object; reconcile lineage of the first object toidentify a set of objects that are impacted by the change to thedefinition of the first object, each object in the second set of objectsdescribing a second portion of the data stored in the data warehousedifferent from the first portion of the data; determine at least oneadditional task to propagate an impact of the change to metadata of oneor more objects in the set of objects that are impacted by the change tothe definition of the first object; and render a visual representationof the impact of the change to the metadata of the one or more objectswithin a graphical user interface.